DCTPep, the data of cancer therapy peptides

With the discovery of their therapeutic activity, peptides have emerged as a promising class of anticancer agents owing to their specific targeting, low toxicity, and potential for high selectivity. In particular, as peptide-drug conjugates enter clinical use, coupling targeting peptides with traditional chemotherapy drugs or cytotoxic agents is expected to become a new direction in cancer treatment. To facilitate the development of cancer therapy peptide drugs, we constructed DCTPep, a novel, open, and comprehensive database of cancer therapy peptides. In addition to traditional anticancer peptides (ACPs), the peptide library also includes other peptides related to cancer therapy. These data were collected manually from published research articles, patents, and other protein or peptide databases. The drug library includes clinically investigated and/or approved peptide drugs related to cancer therapy, collected mainly from the portal websites of drug regulatory authorities and organisations in different countries and regions. DCTPep contains a total of 6214 entries, and we believe it will contribute to the design and screening of future cancer therapy peptides.

with chemotherapy drugs or cytotoxic agents through a linker, such as antibody-drug conjugates (ADCs) and peptide-drug conjugates (PDCs) 24. Currently, ADCs are the most common drug conjugates used in clinical cancer treatment. However, as peptides increasingly enter clinical use, PDCs have also emerged. PDCs have the potential to overcome some limitations of ADCs owing to advantages such as their smaller molecular weight and ease of synthesis 25. To date, only two PDCs, 177Lu-dotatate (DCTPepD0013) and Melflufen (DCTPepD0108), have been approved for clinical cancer treatment, and Melflufen has since been withdrawn from the market by the FDA. Nevertheless, many PDCs are in clinical development or about to enter clinical trials, and their potential cannot be ignored. Peptides play a crucial role as carriers in PDCs. Therefore, DCTPep not only collects ACPs but also emphasizes the collection of cancer-targeting peptides. The carrier peptides in PDCs include cell-penetrating peptides (CPPs) and cell-targeting peptides (CTPs) 26. The classification field in the database follows a similar scheme, including cell-penetrating peptides, cancer-targeting peptides, and targeted peptide conjugates.
Figure 1 and Table 1 present the comparison of the DCTPep datasets with the ACP datasets of other peptide databases. Compared with DBAASP, CancerPPD and SATPdb, DCTPep contains over 3000 unique entries. DCTPep provides a large amount of cancer therapy peptide data, including clinically relevant peptide drugs curated in the drug library, which fills gaps in existing data and supports the design and screening of novel cancer therapeutic peptides. In particular, the targeted peptide data offer more options for PDC design. To better understand the mechanism of action of cancer therapy peptides, we added target annotations, which are not provided by other ACP databases, covering over 60 targets. The dataset is freely available to all via the web, without login or registration, and is not password protected. We believe that DCTPep will become a valuable resource for the development of novel bioactive peptides, particularly in the field of cancer therapeutics.
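The "unique entries" count behind the Venn diagram in Fig. 1 is, in essence, a set difference over sequences. The sketch below illustrates that computation under two assumptions not stated in the paper: entries are matched by exact sequence after upper-casing and trimming whitespace, and the toy sequences stand in for each database's real export. The helper names `normalize` and `unique_to` are hypothetical.

```python
def normalize(seq: str) -> str:
    """Assumed matching rule: case-insensitive, surrounding whitespace ignored."""
    return seq.strip().upper()


def unique_to(target: set, *others: set) -> set:
    """Return sequences in `target` that occur in none of the `others`."""
    seen_elsewhere = set().union(*others)
    return target - seen_elsewhere


# Hypothetical toy data standing in for each database's sequence set.
dctpep = {normalize(s) for s in ["GLFDIVKKVV", "kwklfkki", "RRRRRRRR"]}
cancerppd = {normalize(s) for s in ["KWKLFKKI"]}
satpdb = {normalize(s) for s in ["GLFDIVKKVV"]}
dbaasp: set = set()

only_dctpep = unique_to(dctpep, cancerppd, satpdb, dbaasp)
print(sorted(only_dctpep))  # ['RRRRRRRR']
```

Real comparisons may need a looser rule (for example, handling modified residues or D-amino acids), which would change the counts.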

Methods
Data collection and compilation. To develop DCTPep, extensive searches were conducted on published articles, patents, and public databases. DCTPep data are stored in two sub-libraries: the peptide library and the drug library. The inclusion criteria for the peptide library were as follows: 1. the amino acid sequence is reported; 2. the sequence is a mature peptide without precursor or signal regions; 3. the sequence length does not exceed 100 amino acids; 4. the peptide exhibits anticancer/antitumor activity or targets specific molecules/biomarkers overexpressed in cancer cells; 5. the peptide is a cell-penetrating peptide that can enhance the delivery of drugs into cancer cells. The inclusion criteria for the drug library were similar: 1. peptides and their derivatives, or amino acid derivatives, related to cancer treatment; 2. the drug has entered clinical research or been approved by the FDA, EMA or HC.
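The peptide-library criteria can be read as a sequence of checks. The sketch below encodes them for a hypothetical `Candidate` record whose boolean fields stand for curator judgements, since criterion 2 and the activity criteria cannot be decided from the sequence alone. It also assumes that criteria 4 and 5 are alternatives (either one qualifies a peptide); the paper does not state how they combine.

```python
from dataclasses import dataclass
from typing import Optional

MAX_LENGTH = 100  # criterion 3: at most 100 amino acids


@dataclass
class Candidate:
    """Hypothetical curation record; field names are illustrative, not DCTPep's schema."""
    sequence: Optional[str]            # criterion 1: reported amino acid sequence
    is_mature: bool                    # criterion 2: no precursor/signal region (curator's call)
    anticancer_or_targeting: bool      # criterion 4: anticancer activity or biomarker targeting
    enhances_delivery_to_cancer: bool  # criterion 5: CPP enhancing drug delivery into cancer cells


def meets_peptide_criteria(c: Candidate) -> bool:
    """Return True if the record satisfies the peptide-library inclusion criteria."""
    if not c.sequence:                 # criterion 1 fails: no sequence reported
        return False
    if not c.is_mature:                # criterion 2
        return False
    if len(c.sequence) > MAX_LENGTH:   # criterion 3
        return False
    # Assumption: criteria 4 and 5 are alternatives; either one is enough.
    return c.anticancer_or_targeting or c.enhances_delivery_to_cancer


print(meets_peptide_criteria(Candidate("GLFDIVKKVV", True, True, False)))  # True
```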
To collect peptide data, keywords were used to search academic search engines and databases such as Google Scholar, Web of Science, PubMed, and Google Patents. The keywords included "ACP", "antiangiogenic peptides", "cancer therapy peptide", "cancer targeted peptide", and "peptide conjugates". After research papers, patents, and clinical research literature were collected, data were extracted manually. In addition to the cancer therapy peptide information extracted manually from the literature, we also included other peptide-related information (such as three-dimensional structures) from UniProt 27, PDB 28, and other databases. The physicochemical properties of the peptides were calculated using the Expasy ProtParam server (https://web.expasy.org/protparam/, accessed March 2024) and SciDBMaker 29.
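ProtParam and SciDBMaker are external tools, and the sketch below is not a reimplementation of either. It only illustrates the kind of arithmetic behind two reported fields for an unmodified linear peptide: average mass from approximate average residue masses plus one water, and a crude net-charge count (Lys/Arg +1, Asp/Glu -1; histidine and the termini are ignored, which is an assumption). Results may differ slightly from the values the tools report.

```python
# Approximate average residue masses (Da), as in common residue-mass tables.
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01524  # added once for the free N- and C-termini


def average_mass(seq: str) -> float:
    """Average molecular mass of an unmodified linear peptide (standard residues only)."""
    seq = seq.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


def crude_net_charge(seq: str) -> int:
    """Crude charge near neutral pH: count K/R as +1 and D/E as -1."""
    seq = seq.upper()
    return sum(seq.count(aa) for aa in "KR") - sum(seq.count(aa) for aa in "DE")


print(round(average_mass("GG"), 2))   # 132.12
print(crude_net_charge("KWKLFKKI"))   # 4
```

Modified residues (the "Other Modification" field) would need their own mass entries, which is why the sketch rejects non-standard letters instead of guessing.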
The drug library data mainly originated from the portal websites of drug regulatory authorities and organisations in several countries and regions, supplemented by the drug databases DrugBank 30, PubChem 31, NCI Thesaurus 32 and the Global Substance Registration System (GSRS) 33. Relevant entries were retrieved by searching these websites and databases with keywords such as "peptides and their derivatives", "amino acids and their derivatives", and "anticancer".
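The searches described above were performed on the websites themselves. As an illustration only (not the authors' stated procedure), PubChem also exposes a PUG REST interface, and the sketch below builds a property-lookup URL for a drug name. The helper names `property_url` and `fetch_properties` are hypothetical. Calling `fetch_properties` requires network access, so the example only prints the URL.

```python
import json
import urllib.parse
import urllib.request

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def property_url(drug_name: str, properties=("MolecularFormula", "MolecularWeight")) -> str:
    """Build a PUG REST URL that looks up compound properties by name."""
    name = urllib.parse.quote(drug_name, safe="")  # escape spaces, slashes, etc.
    return f"{PUG_REST}/compound/name/{name}/property/{','.join(properties)}/JSON"


def fetch_properties(drug_name: str) -> list:
    """Perform the request (needs network access); one dict per matching CID."""
    with urllib.request.urlopen(property_url(drug_name), timeout=30) as resp:
        payload = json.load(resp)
    return payload["PropertyTable"]["Properties"]


print(property_url("melflufen"))
```

Name lookups can return several CIDs or none for peptide drugs with many synonyms, so the results would still need manual checking against the regulatory sources.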
Structure prediction and evaluation. Because peptide and protein structures are difficult to determine experimentally, most peptides lack experimentally determined structures. AlphaFold 34 was used to predict the potential 3D structures of DCTPep peptides with default parameters: each peptide was modeled as a monomer, and the multiple sequence alignment (MSA) database preset was full_dbs (all genetic databases) 34. Five models were generated for each peptide, and the model with the highest predicted local distance difference test (pLDDT) score was selected 34. To evaluate the reliability of AlphaFold-predicted peptide structures, 30 peptides with experimentally determined structures were selected and their structures were predicted by AlphaFold. The differences between the predicted and experimentally determined structures were quantified by the root-mean-square deviation (RMSD) 35. Given two conformations α and β of N residues, let $r_i^{\alpha}$ and $r_i^{\beta}$ be the respective coordinates of their residues at position i, for i = 1, …, N. The RMSD between α and β is given by Eq. (1):

$$\mathrm{RMSD}(\alpha,\beta) = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left\lVert r_i^{\alpha} - Q\,r_i^{\beta}\right\rVert^{2}} \tag{1}$$

where Q is the unitary rotation matrix that optimally aligns the two sets of vectors. Disulfide bonds were also examined to determine whether AlphaFold predicts them correctly. Whatcheck 36 and Procheck 37 were used to assess the quality of the predicted structures. Whatcheck 36 evaluates multiple parameters of the input structure, such as bond lengths, bond angles, and torsion angles. Procheck 37 assesses the stereochemical quality of the input structure and provides various graphical outputs. The Ramachandran plot 38 is used to evaluate the plausibility of the structure: combinations of the backbone dihedral angles Φ (phi) and Ψ (psi) are expected to lie in the most favored and allowed regions (core regions) of the plot. Ideally, a protein structure should have over 90% of its residues' Φ-Ψ dihedral angle pairs in these core regions 37.

Table 2. Peptide library data annotation field list.
N/C-terminal Modification: Modifications of the N/C-terminus, according to the references.
Other Modification: Special amino acids (other than the 20 common amino acids).
Chiral: The L/D amino acid composition of the peptide.
Physicochemical Information: Formula, mass, pI, net charge and other information, calculated by ProtParam and SciDBMaker.
Literature Information: The papers or patents from which the peptide information was taken, with details for locating the full text.
Link: Corresponding links to other peptide databases.

Table 3. Drug library data annotation field list.
General information
DCTPepD ID: Identification code for the DCTPepD drug library; provides the unique accession number linking to the corresponding DCTPepD entry.
Active Ingredients: Active pharmaceutical ingredient, i.e. the substance through which the drug actually works.
Description: Drug description, derived from NCI or literature sources.
Synonyms: Other names of the drug.
Disease: Applicable diseases.
Classification: Drug categories.
Structure information: Molecular formula, molecular weight, active sequence, sequence length, modification, and other structural information.
External Codes: External identification codes, with access links to PubChem, DrugBank, NCI Thesaurus and GSRS.
Drug indication: Derived from DrugBank or clinical trials.
Approved information: Approved drug formulation information, sourced from Drugs@FDA, the European Medicines Agency (EMA), and Health Canada.
Clinical information: Information sourced from ClinicalTrials.gov.

Data Records
The DCTPep datasets are available at Figshare 39 and contain the following files: All_information (annotation table for peptide library entries), peptideactivity (activity annotations of peptide library entries), peptidedrug (annotation table for the active ingredients of drug library entries), marketpeptide (annotations of approved drug formulations in the drug library), clinicalpeptide (clinical annotations of drug library entries), peptide_library_all (peptide library data in FASTA format) and prediction pdb (compressed archive of the predicted structures). The architecture of DCTPep is shown in Fig. 2. DCTPep contains a total of 6214 peptide entries, of which 6106 are stored in the peptide library and 108 in the drug library (DCTPepD), involving over 60 targets and over 380 cancer cell lines.
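The peptide_library_all file is in FASTA format. The minimal reader below maps entry IDs to sequences, assuming (not confirmed by the paper) that the entry ID is the first whitespace-delimited token of each header line; the two-record sample uses hypothetical IDs.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA records into {entry ID: sequence}.

    Assumption: the entry ID is the first whitespace-delimited token after '>'.
    Multi-line sequences are concatenated; blank lines are skipped.
    """
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current_id is not None:          # close the previous record
                records[current_id] = "".join(chunks)
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else ""
            chunks = []
        elif current_id is not None:            # ignore text before the first header
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


# Hypothetical two-record example; real headers in peptide_library_all may differ.
sample = """>pep_A example description
GLFDIVKK
VV
>pep_B
KWKLFKKI
"""
library = parse_fasta(sample.splitlines())
print(len(library), library["pep_A"])  # 2 GLFDIVKKVV
```

For the downloaded file, an open file handle can be passed directly (for example inside `with open(path) as fh: parse_fasta(fh)`); Biopython's `SeqIO.parse(handle, "fasta")` is a more complete alternative.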
Table 2 displays detailed annotation information for the data in the peptide library. Each entry in the peptide library consists of the following sections: general information, activity information, structural information, physicochemical information, literature information, and links. The peptide library includes cancer therapy peptides such as traditional ACPs and cancer-targeting peptides. Low cytotoxicity and low hemolytic activity are also important criteria for developing peptide-based drugs; therefore, in addition to anticancer activity and targets, the activity information also includes cytotoxicity and hemolytic activity. All annotation information was manually extracted from the literature, and the corresponding paper or patent source is provided. The physicochemical information was calculated by ProtParam and SciDBMaker 29. For the same peptide, the emphasis of the information recorded in different databases may vary; therefore, DCTPep provides the corresponding entry codes in other peptide databases.
The drug library includes peptide drugs that have been approved or are in the clinical research stage. Table 3 shows detailed annotation information for the drug library data. Each entry consists of four sections: general information, structural information, external codes, and drug approval. The external codes provide identification codes for drug entries in other public databases, allowing users to obtain more comprehensive information on related entries from other sources. Approved drug formulations and clinical information can be found in the drug approval section. In total, the drug library includes 28 approved anticancer peptide drugs and 80 peptides in various clinical trial stages.

Technical Validation
AlphaFold demonstrated unprecedented accuracy in the 14th Critical Assessment of Protein Structure Prediction (CASP14) 34. The study by McDonald et al. 40 also indicated that AlphaFold can accurately predict the structures of peptides with α-helices, β-sheets, and disulfide-rich peptides. To evaluate the accuracy of AlphaFold on DCTPep peptides, the structures of 30 ACPs with experimentally determined structures were predicted by AlphaFold.
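Eq. (1) presupposes the optimal rotation Q. One standard way to obtain it is the Kabsch algorithm: centre both coordinate sets, then derive the rotation from the singular value decomposition of their 3x3 covariance matrix. The sketch below applies it to matched Cα coordinates. It illustrates the formula under that choice of method and is not necessarily the tool the authors used for the values in Table 4.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD of Eq. (1) after optimally superposing q onto p.

    p and q are (N, 3) arrays whose rows correspond (e.g. matched C-alpha atoms).
    Both sets are centred first (removing translation); the optimal rotation is
    then obtained from the SVD of their 3x3 covariance matrix.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    h = q_c.T @ p_c                          # covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # d = -1 signals a reflection; correct it
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c - q_c @ rot.T                 # rotate each row of q_c, compare with p_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Usage with toy coordinates: a rotated and translated copy should give RMSD ~ 0.
theta = np.deg2rad(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
ca_exp = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.6, 0.0], [4.0, 5.0, 3.0]])
ca_pred = ca_exp @ rot_z.T + np.array([1.0, -2.0, 0.5])
print(kabsch_rmsd(ca_exp, ca_pred))  # a value very close to 0
```

The reflection correction matters because an unconstrained least-squares fit could otherwise superpose a structure onto its mirror image, which no physical rotation can do.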
Table 4 and Fig. 3 display the comparison between the predicted and experimental structures, including RMSD values and disulfide bond positions. The results indicate that the predicted structures are highly accurate: the deviations from the experimental structures are small, with an average Cα (α-carbon atom) RMSD of 1.621 Å. For structures containing disulfide bonds, AlphaFold accurately predicted the positions of the disulfide bonds. Some of the predicted peptide structures can be obtained directly from the AlphaFold Protein Structure Database 41, for example, AF-P82393-F1 (DCTPep00006) and AF-P80400-F1 (DCTPep00097).
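Comparing disulfide positions first requires identifying which cysteine pairs are bonded in each structure. The sketch below uses a common geometric heuristic, assumed here rather than taken from the paper: two cysteines are called bonded when their SG atoms lie within 2.5 Å (an S-S bond is about 2.05 Å long). It reads SG coordinates from fixed-column PDB ATOM records and compares the resulting residue pairs. The toy coordinates are hypothetical.

```python
import math
from itertools import combinations

SS_CUTOFF = 2.5  # Å; assumed heuristic threshold


def cysteine_sg_atoms(pdb_text: str) -> dict:
    """Map (chain ID, residue number) -> SG coordinates, read from PDB ATOM records."""
    sg = {}
    for line in pdb_text.splitlines():
        if (line.startswith("ATOM") and line[17:20] == "CYS"
                and line[12:16].strip() == "SG"):
            key = (line[21], int(line[22:26]))
            sg[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return sg


def disulfide_pairs(pdb_text: str, cutoff: float = SS_CUTOFF) -> set:
    """Return cysteine residue pairs whose SG atoms lie within `cutoff` Å."""
    sg = cysteine_sg_atoms(pdb_text)
    return {(a, b) for (a, xa), (b, xb) in combinations(sorted(sg.items()), 2)
            if math.dist(xa, xb) <= cutoff}


def sg_line(serial: int, resseq: int, x: float, y: float, z: float) -> str:
    """Build a toy PDB ATOM record for a cysteine SG atom (fixed PDB column widths)."""
    return f"ATOM  {serial:5d}  SG  CYS A{resseq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


# Hypothetical predicted and experimental structures with the same Cys3-Cys20 bond.
predicted = "\n".join([sg_line(1, 3, 0.0, 0.0, 0.0),
                       sg_line(2, 10, 9.0, 0.0, 0.0),
                       sg_line(3, 20, 2.04, 0.0, 0.0)])
experimental = "\n".join([sg_line(1, 3, 1.0, 1.0, 1.0),
                          sg_line(2, 10, 8.0, 1.0, 1.0),
                          sg_line(3, 20, 1.0, 3.05, 1.0)])
print(sorted(disulfide_pairs(predicted)))                          # [(('A', 3), ('A', 20))]
print(disulfide_pairs(predicted) == disulfide_pairs(experimental))  # True
```

In practice, experimental PDB files often declare disulfides explicitly in SSBOND records, which could be used instead of the distance heuristic for the reference structures.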
pLDDT is an important parameter for assessing the confidence of predictions 34. Although pLDDT alone may not fully define the accuracy of predicted peptide structures, it still reflects their accuracy to some extent. DCTPep integrates the Mol* Viewer 42 to display the predicted structures, in which the pLDDT of each residue can be visualized 43 (Fig. 4). The quality of the predicted structures was assessed using Whatcheck 36 and Procheck 37 (Table 5), and the results indicate that the predicted structures are reliable. The average Whatcheck error rate is 11.52%, which is relatively low. In the Ramachandran plots generated by Procheck, the average core-region occupancy is 95.11%, and only DCTPep00623 has a low core-region occupancy. The average disallowed-region occupancy is 0.26%, and only DCTPep00267 has a residue in the disallowed regions. These errors are within an acceptable range.
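AlphaFold writes each residue's pLDDT into the B-factor field of its output PDB files, which is what structure viewers read to colour residues by confidence. The sketch below reads that column for Cα atoms, picks the model with the highest mean pLDDT (mirroring the selection step in the Methods; AlphaFold's own pipeline also ranks its models by confidence), and maps scores to the confidence bands used for colouring in the AlphaFold Protein Structure Database. The helper names and toy models are hypothetical.

```python
from typing import Dict


def mean_plddt(pdb_text: str) -> float:
    """Mean pLDDT over C-alpha atoms, read from the B-factor field (line[60:66])."""
    values = [float(line[60:66]) for line in pdb_text.splitlines()
              if line.startswith("ATOM") and line[12:16].strip() == "CA"]
    if not values:
        raise ValueError("no C-alpha ATOM records found")
    return sum(values) / len(values)


def best_model(models: Dict[str, str]) -> str:
    """Name of the model with the highest mean pLDDT."""
    return max(models, key=lambda name: mean_plddt(models[name]))


def plddt_band(score: float) -> str:
    """Confidence bands used for pLDDT colouring in the AlphaFold DB."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"


def ca_line(serial: int, resseq: int, plddt: float) -> str:
    """Build a toy PDB ATOM record for a C-alpha atom with pLDDT in the B-factor field."""
    return (f"ATOM  {serial:5d}  CA  ALA A{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{plddt:6.2f}")


# Two hypothetical two-residue models: mean pLDDT 85.0 versus 82.5.
models = {
    "model_1": "\n".join([ca_line(1, 1, 80.0), ca_line(2, 2, 90.0)]),
    "model_2": "\n".join([ca_line(1, 1, 95.0), ca_line(2, 2, 70.0)]),
}
print(best_model(models))                           # model_1
print([plddt_band(s) for s in (95, 75, 55, 30)])    # ['very high', 'confident', 'low', 'very low']
```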

Fig. 1 Venn diagram illustrating the numbers of overlapping and non-overlapping peptide sequences related to cancer therapy from the DCTPep, CancerPPD, SATPdb and DBAASP.

Fig. 2 Architecture of the datasets in DCTPep.

Table 1. Comparison of peptides related to cancer therapy in DCTPep with other peptide databases (data as of 2023.12.20).

Table 4. Comparison between predicted structures and experimental structures.

Table 5. Results of the predicted structures in Whatcheck and Procheck.